Thermal capacity management

ABSTRACT

Embodiments of the present disclosure generally relate to the field of thermal capacity management within data centers. In an embodiment, the present disclosure describes a method including using temperature measurements to provide real-time capacity usage information in a given data center and to use that information to perform moves/adds/changes with a particular level of confidence.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of, and claims the benefits of priority to, U.S. patent application Ser. No. 14/474,496, filed on Sep. 2, 2014 (now allowed), and U.S. Provisional Patent Application No. 61/873,632, filed on Sep. 4, 2013, which are incorporated herein by reference in their entireties.

FIELD OF INVENTION

Embodiments of the present invention generally relate to the field of thermal capacity management within data centers, and more specifically, to methods and systems which provide feedback based on thermal information associated with parts of a data center.

BACKGROUND

Data centers are often designed with a projected capacity, which is usually more than twice the capacity utilized on the first day of operation. Consequently, over time, equipment within the data center gets updated, replaced, and added as necessitated by operational needs. Given the changes which take place during the life of a data center, it is important to be aware of what locations can be considered safe for new equipment. The safety factor is not only dictated by the available rack unit (RU) spaces within cabinets, the cabinet weight limits, and power availability, but more importantly it is also dictated by the available cooling (thermal) capacity at a given cabinet.

Various systems directed to thermal capacity management have been developed. However, continued need by data center managers for new ways of evaluating thermal capacity of a data center and how electronic equipment impacts this capacity creates a need for new and improved systems and methods related to this field.

SUMMARY

Accordingly, at least some embodiments of the present invention are generally directed to systems and methods for providing feedback information based on thermal and power variables.

In an embodiment, the present invention is a method comprising the steps of using temperature measurements and power meter readings to provide a real-time capacity usage in a given data center.

In another embodiment, the present invention is a system for managing cooling capacity within a data center or within a subset of a data center, where the system includes at least one processor; and a computer readable medium connected to the at least one processor. The computer readable medium includes instructions for collecting information from a plurality of cabinets, the information including an inlet temperature, a maximum allowable cabinet temperature, and a supply air temperature, where the collected temperatures are used to calculate a value Theta for each of the plurality of cabinets. The computer readable medium further includes instructions for determining whether any of the calculated Theta values indicates that any of the plurality of cabinets' inlet temperatures is at least one of below, at, and above the respective maximum allowable cabinet temperature. The computer readable medium further includes instructions for determining whether, based on any of the calculated Theta values, performed cooling capacity management will satisfy a user confidence level, and if the confidence level is satisfied, for distributing the remaining cooling capacity over the plurality of cabinets.

In yet another embodiment, the present invention is a non-transitory computer readable storage medium including a sequence of instructions stored thereon for causing a computer to execute a method for managing cooling capacity within a data center. The method includes collecting cabinet information from each of a plurality of cabinets, the cabinet information including an inlet temperature, a maximum allowable cabinet temperature, and a supply air temperature. The method also includes collecting a total power consumption for the plurality of cabinets. The method also includes collecting a total cooling capacity for the plurality of cabinets. The method also includes deriving a remaining cooling capacity for the plurality of cabinets. The method also includes for each of the plurality of cabinets calculating a θ value, each of the calculated θ values being calculated at least in part from the respective collected cabinet information. And the method also includes for each of the calculated θ values determining whether any one of the plurality of cabinets' inlet temperatures is at least one of below, at, and above the respective maximum allowable cabinet temperatures, where if any one of the inlet temperatures is at least one of at and above the respective maximum allowable cabinet temperatures, providing a first alarm, and where if all of the inlet temperatures are below the respective maximum allowable cabinet temperatures, determining whether each of the calculated θ values is at least one of below, at, and above a user-defined θ value, where if any one of the calculated θ values is at least one of at and above the user-defined θ value, providing a second alarm, and where if all of the calculated θ values are below the user-defined θ value, distributing the remaining cooling capacity over the plurality of cabinets.

These and other features, aspects, and advantages of the present invention will become better understood with reference to the following drawings, description, and any claims that may follow.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a flow chart representative of an embodiment of the present invention.

FIG. 2 illustrates the correlation between a confidence percentage and a Theta value.

FIG. 3 illustrates an executed embodiment of the present invention.

DETAILED DESCRIPTION

Referring now to FIG. 1, this figure illustrates a flowchart showing the steps performed in one embodiment of the present invention. In the initial step 100, cabinet inlet temperatures T_(max,i) are obtained from the available cabinets within the data center. This can be achieved by monitoring temperature sensors which are installed within the cabinets. Where only one sensor is installed in a cabinet, only one temperature reading can be obtained; where multiple sensors are installed in a cabinet, it is preferable to use the maximum recorded temperature for the T_(max,i) value. Alternatively, average values may be considered.

In the next step 105, power consumption values P_(i) are obtained from the available cabinets. One way of obtaining the necessary real-time power readings is to collect power usage information from power outlet units (POUs) which are typically installed in data center cabinets. Each POU provides a total power usage reading for the respective cabinet. Adding the available POU readings from each of the cabinets present within a data center or within a subset of a data center provides the total power usage value ΣP_(i) for the respective data center or for a respective subset of that data center.

In the next step 110, the total cooling capacity of a data center or of a subset of a data center is calculated. This can be done by using manufacturer-supplied data, such as the rated capacity of the cooling equipment within the data center. Using this data, the rated capacities of the cooling equipment within the data center or within a subset of a data center are summed together to obtain the total cooling capacity available at the current specified cooling equipment set-point.

Having the total power usage, it is then possible to determine the remaining cooling capacity in step 115. To do this, the total power usage ΣP_(i) calculated in step 105 is subtracted from the total cooling capacity calculated in step 110. The result is the remaining cooling capacity P_(cool).
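Steps 105 through 115 can be sketched as follows. This is a minimal illustration only: the function names, the use of kilowatts, and the sample readings are assumptions made for the example and do not appear in the disclosure.

```python
from typing import Sequence


def total_power_kw(pou_readings_kw: Sequence[float]) -> float:
    """Step 105: sum the per-cabinet POU readings P_i to obtain sum(P_i)."""
    return sum(pou_readings_kw)


def total_cooling_capacity_kw(rated_capacities_kw: Sequence[float]) -> float:
    """Step 110: sum the rated capacities of the cooling equipment."""
    return sum(rated_capacities_kw)


def remaining_cooling_capacity_kw(
    pou_readings_kw: Sequence[float],
    rated_capacities_kw: Sequence[float],
) -> float:
    """Step 115: P_cool = total cooling capacity - total power usage."""
    return total_cooling_capacity_kw(rated_capacities_kw) - total_power_kw(
        pou_readings_kw
    )


# Hypothetical readings: three cabinets and two cooling units.
p_cool = remaining_cooling_capacity_kw([3.0, 4.5, 2.5], [20.0, 20.0])
```

A negative result would indicate that the measured load already exceeds the rated cooling capacity; how such a case is handled is not addressed here.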

Next, it is necessary to calculate a non-dimensional parameter Theta (θ). This parameter is computed for at least one cabinet, and preferably for every cabinet in a data center or a subset of a data center. For every cabinet, Theta is calculated using the maximum inlet cabinet temperature T_(max,i), maximum allowable temperature T_(Allowable), and the supply air temperature T_(SAT) of the air being supplied by the cooling equipment, where θ is derived using the following equation:

$\theta = \frac{T_{\max,i} - T_{SAT}}{T_{Allowable} - T_{SAT}}$

The maximum allowable temperature T_(Allowable) can either be obtained from the manufacturer's specification or be set to any value deemed appropriate by the user. The supply air temperature T_(SAT) of the air being supplied by the cooling equipment can be obtained by way of measuring said temperature at or near the equipment supplying the cooling air or at any position before the cabinet that is deemed to provide an accurate representation of the temperature of the air that is being supplied.

Theta can be described as the temperature gradient between the inlet temperature T_(max,i) of each cabinet and the supply air temperature T_(SAT), with respect to a maximum allowable temperature T_(Allowable). A Theta value of zero indicates that the cabinet inlet temperature is at the supply air temperature (no gradient). A Theta value of one indicates that the cabinet inlet temperature is at the allowable temperature, and a value above one indicates a cabinet inlet temperature above the allowable temperature.
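The Theta equation and its three reference points can be illustrated with a short sketch. The function name, the argument names, and the sample temperatures (in degrees Celsius) are assumptions chosen for the example; the guard against a non-positive denominator is likewise an added assumption, since the disclosure does not discuss the case where T_(Allowable) is at or below T_(SAT).

```python
def theta(t_inlet_max: float, t_allowable: float, t_sat: float) -> float:
    """Non-dimensional Theta for one cabinet.

    theta = (T_max,i - T_SAT) / (T_Allowable - T_SAT)
    All three temperatures must be expressed in the same unit.
    """
    span = t_allowable - t_sat
    if span <= 0:
        # Assumption: an allowable temperature at or below the supply air
        # temperature leaves Theta undefined or meaningless.
        raise ValueError("t_allowable must be greater than t_sat")
    return (t_inlet_max - t_sat) / span


# Hypothetical values: 18 C supply air, 27 C allowable inlet temperature.
at_supply = theta(18.0, 27.0, 18.0)     # inlet at supply air temperature
at_limit = theta(27.0, 27.0, 18.0)      # inlet at the allowable temperature
halfway = theta(22.5, 27.0, 18.0)       # inlet halfway between the two
```

With these inputs, `at_supply` is 0.0, `at_limit` is 1.0, and `halfway` is 0.5, matching the interpretation of Theta given above.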

As shown in step 125, the calculated Theta value is used to determine the next course of action. If any one cabinet inlet temperature is at or above a set allowable temperature (evidenced by a Theta value being equal to or greater than 1), the system determines that there is no additional cooling capacity available on any of the cabinets until the issue of the inlet temperature being higher than the allowable temperature is resolved to where the inlet temperature is lower than the allowable temperature. To notify the user of the potential risk of overheating, an alarm may be signaled to the user, as shown in step 130. This may be done in any number of suitable ways and can include electronic, visual, aural, or any other appropriate methods of delivery. In one embodiment, the user receives a message within data center management software used to manage the data center where the message provides a map-like representation of the data center with any of the problematic cabinets being highlighted a certain color. In a variation of this embodiment all the cabinets may be highlighted such that any cabinet having Theta ≥ 1 appears red, any cabinet having 1 > Theta > 0 appears yellow, and any cabinet having Theta = 0 appears green. Once the user has received an alarm, the user may undertake the necessary action to remedy the problem. As illustrated in step 135, the present invention may provide the user with potential ways to fix the issues causing the alarm. This may include, without limitation, suggestions to check the blanking panels, add perforated tiles, and/or change the cooling unit set-point.
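The color-coded variation described above might be expressed as a simple mapping from Theta to a highlight color. This is a sketch under stated assumptions: the function name is hypothetical, and the treatment of a negative Theta (inlet temperature below the supply air temperature) as green is an assumption, because the description only specifies the cases Theta ≥ 1, 1 > Theta > 0, and Theta = 0.

```python
def cabinet_color(theta_value: float) -> str:
    """Map a cabinet's Theta to the highlight color used on the layout map."""
    if theta_value >= 1.0:
        return "red"      # inlet at or above the allowable temperature
    if theta_value > 0.0:
        return "yellow"   # inlet between supply air and allowable temperature
    # Theta == 0 per the description; Theta < 0 is treated the same way
    # here purely as an assumption.
    return "green"


# Hypothetical Theta values for four cabinets.
colors = [cabinet_color(t) for t in (0.0, 0.05, 0.6, 1.2)]
```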

If all the cabinet inlet temperatures are below the allowable temperature (evidenced by having all the calculated Theta values remain below 1), the present invention compares the calculated Theta values against a predefined θ_(user) value. The θ_(user) value corresponds to a specific user-confidence percentage, and the predefined correlation between the two is derived through a number of Computational Fluid Dynamics (CFD) models that are representative of actual data centers (as explained later in the specification). The plot in FIG. 2 shows the confidence value in the cooling capacity management method used for different Theta values. If, for example, the user-specified Theta (θ_(user)) is 0.3 or below, the cooling capacity management method represented by the flow chart of FIG. 1 is likely to work 100% of the time, keeping a safe thermal environment for the IT equipment. However, if the θ_(user) is 0.6, the cooling capacity management method represented by the flow chart of FIG. 1 is likely to work 70% of the time. Note that if and when a certain confidence level is selected, such a level correlates to the highest possible value of θ_(user) that will still correspond to the selected confidence level. Therefore, for example, if a confidence level of 100% is selected, the θ_(user) value used in the execution of the present invention will be 0.3 instead of 0.1.

Thus, if in step 140 it is determined that the calculated Theta values for a set of cabinets or all the cabinets within a data center fall below a predefined value θ_(user), the present invention distributes the remaining cooling capacity P_(cool) over said cabinets in step 145 and provides the user with a confidence percentage that the executed distribution will successfully work. If, however, any of the calculated Theta values are equal to or greater than θ_(user), the present invention outputs an alarm (similar to the alarm of step 130) in step 150. This alarm can signal to the user that the cooling capacity management in accordance with the present invention would not achieve a sufficient confidence percentage.
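The branching of steps 125 through 150 can be sketched as a single function. The `Decision` type, the action labels, and the function name are hypothetical, and the even split of P_(cool) is only one possible distribution scheme (it is the one used in the FIG. 3 example); this is an illustration of the control flow, not a definitive implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Decision:
    action: str  # "alarm_overheat", "alarm_confidence", or "distribute"
    per_cabinet_kw: Optional[List[float]] = None


def manage_capacity(
    thetas: Sequence[float], theta_user: float, p_cool_kw: float
) -> Decision:
    """Apply the checks of steps 125 and 140 to per-cabinet Theta values."""
    if not thetas:
        raise ValueError("at least one cabinet is required")
    # Step 125: any Theta >= 1 means an inlet is at or above the allowable
    # temperature, so no capacity is distributed (alarm of step 130).
    if any(t >= 1.0 for t in thetas):
        return Decision("alarm_overheat")
    # Step 140: any Theta >= theta_user means the selected confidence level
    # cannot be met (alarm of step 150).
    if any(t >= theta_user for t in thetas):
        return Decision("alarm_confidence")
    # Step 145: distribute P_cool; an even split is assumed here.
    share = p_cool_kw / len(thetas)
    return Decision("distribute", [share] * len(thetas))


# Hypothetical inputs: two cabinets, theta_user of 0.3, 10 kW remaining.
result = manage_capacity([0.05, 0.02], theta_user=0.3, p_cool_kw=10.0)
```

Checking the overheating condition first mirrors the order in the flow chart: a cabinet at or above its allowable temperature blocks distribution regardless of the selected confidence level.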

Note that the predefined θ_(user) value can be set by the user by way of selecting a desired confidence level, wherein based on the selected confidence level, the present invention determines the appropriate θ_(user) value. Thus, if the user had determined that the appropriate confidence percentage was no less than ~85%, the present invention would translate that percentage into a θ_(user) value of 0.4 and use that value in step 140.
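The translation from a selected confidence level to θ_(user) could be sketched as a table lookup that returns the highest tabulated θ_(user) still meeting the requested confidence. The table below contains only the three points stated in the description (0.3 at 100%, 0.4 at approximately 85%, 0.6 at 70%); it is not a reproduction of the FIG. 2 curve, and the function and variable names are hypothetical.

```python
from typing import List, Tuple

# (theta_user, confidence percent). Only the three pairs given in the text;
# 85.0 stands in for the approximate "~85%" value.
CONFIDENCE_TABLE: List[Tuple[float, float]] = [
    (0.3, 100.0),
    (0.4, 85.0),
    (0.6, 70.0),
]


def theta_user_for_confidence(min_confidence_pct: float) -> float:
    """Return the largest tabulated theta_user whose confidence is at least
    the requested percentage."""
    candidates = [
        theta_u
        for theta_u, conf in CONFIDENCE_TABLE
        if conf >= min_confidence_pct
    ]
    if not candidates:
        raise ValueError("no tabulated theta_user meets the requested confidence")
    return max(candidates)


theta_u_100 = theta_user_for_confidence(100.0)
theta_u_85 = theta_user_for_confidence(85.0)
theta_u_70 = theta_user_for_confidence(70.0)
```

Because the table is sparse, a request that falls between tabulated points (for example 90%) resolves to the next more conservative entry, and a request below 70% resolves to 0.6 simply because no larger value is tabulated here.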

As noted previously, the correlation between the θ_(user) value and the confidence level is developed via a number of Computational Fluid Dynamics (CFD) models that are representative of real data centers. The CFD models are run for different conditions, changing a number of key variables such as: supply air temperature, cabinet power, and different types of IT equipment. For each case, the CFD models are run with different air ratios (AR). In an embodiment, the ranges are from 0.8 AR to 2 AR. Air ratio is defined as the ratio between the airflow supplied by the cooling units and the total airflow required for the IT equipment.
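The air ratio definition can be written out directly. The function name and the airflow figures are illustrative assumptions; any consistent airflow unit may be used, since the ratio is dimensionless.

```python
def air_ratio(supplied_airflow: float, required_airflow: float) -> float:
    """AR = airflow supplied by the cooling units / airflow required by the
    IT equipment. Both arguments must use the same unit."""
    if required_airflow <= 0:
        raise ValueError("required_airflow must be positive")
    return supplied_airflow / required_airflow


# Hypothetical airflows: the cooling units supply 8,000 units of airflow
# while the IT equipment requires 10,000, i.e. an undersupplied room.
ar = air_ratio(8000.0, 10000.0)
in_studied_range = 0.8 <= ar <= 2.0
```

An AR below 1 means the cooling units supply less air than the IT equipment draws, while an AR above 1 means they supply more.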

For each CFD run, the maximum cabinet inlet temperatures are monitored. If a cabinet maximum inlet temperature exceeds a specified allowable temperature, thermal capacity is not managed. If all cabinet inlet temperatures are below the allowable temperatures, capacity is managed by distributing the available cooling capacity among all the cabinets equally. The model is then rerun using the new managed capacity for different ARs. Theta is calculated per cabinet for the baseline run with the minimum AR that provided safe cabinet inlet temperatures. The maximum Theta value is used for the percent confidence value in the present invention.

This work is repeated for the remaining CFD models at different cases. The maximum Theta values are collected to provide the overall percent confidence in the present invention. The percent confidence is a way of providing the user with a barometer of confidence in the approach used for capacity management among the cabinets, for a given set of Theta values in their data center.

An example of how a system in accordance with the present invention may be used is shown in FIG. 3. This figure illustrates two data center layouts (one being the current layout and one being the projected layout) and provides a user input interface where the user may select a particular confidence level. In the currently described embodiment the selection of the confidence level is done by way of a slider which ranges from “optimistic” to “conservative” with “conservative” being most confident and “optimistic” being least confident. However, a particular confidence level may be inputted in any number of ways, including without limitation manual entry of a number or automatic entry based on at least one other factor. Having the necessary temperature values, the system calculates the maximum Theta value to be 0.05. Given that this value is below 1, the present invention proceeds to the next step without triggering an alarm. The 0.05 Theta value is then compared to the θ_(user) value which is derived from the selected confidence level percentage. In the described embodiment, the selected confidence level percentage is ~100%, which translates to a θ_(user) value of 0.3. Since the maximum Theta is below the θ_(user) value, the system proceeds, yet again without triggering an alarm, to distribute the remaining cooling capacity over all the cabinets under consideration. In this case, the remaining cooling capacity is distributed evenly, and thus each cabinet receives an additional 5.73 kW of cooling capacity. In alternate embodiments, alternate distribution schemes may be implemented.

Note that the mention of the “data center” should not be interpreted as referring only to an entire data center, as it may refer only to a subset of a data center. Accordingly, references to a “data center” throughout this application and the claims may be understood to refer to the entire data center and/or to a subset of a data center.

Embodiments of the present invention may be implemented using at least one computer. At least some of the operations described above may be codified in computer readable instructions such that these operations may be executed by the computer. The computer may be a stationary device (e.g., a server) or a portable device (e.g., a laptop). The computer includes a processor, memory, and one or more drives or storage devices. The storage devices and their associated computer storage media provide storage of computer readable instructions, data structures, program modules and other non-transitory information for the computer. Storage devices include any device capable of storing non-transitory data, information, or instructions, such as: a memory chip storage including RAM, ROM, EEPROM, EPROM or any other type of flash memory device; a magnetic storage device including a hard or floppy disk, and magnetic tape; optical storage devices such as a CD-ROM disc, a BD-ROM disc, and a BluRay™ disc; and holographic storage devices.

The computer may operate in a networked environment using logical connections to one or more remote computers. The remote computer may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and may include many if not all of the elements described above relative to the computer. Networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. For example, in the subject matter of the present application, a computer may comprise the source machine from which data is being migrated, and the remote computer may comprise the destination machine. Note, however, that source and destination machines need not be connected by a network or any other means, but instead, data may be migrated via any media capable of being written by the source platform and read by the destination platform or platforms. When used in a LAN or WLAN networking environment, a computer is connected to the LAN through a network interface or an adapter. When used in a WAN networking environment, a computer typically includes a network interface card or other means for establishing communications over the WAN to environments such as the Internet. It will be appreciated that other means of establishing a communications link between the computers may be used.

Those having skill in the art will recognize that the state of the art has progressed to the point where there is little distinction left between hardware and software implementations of aspects of systems; the use of hardware or software is generally (but not always, in that in certain contexts the choice between hardware and software can become significant) a design choice representing cost vs. efficiency tradeoffs. Those having skill in the art will appreciate that there are various vehicles by which processes and/or systems and/or other technologies described herein can be effected (e.g., hardware, software, and/or firmware), and that the preferred vehicle will vary with the context in which the processes and/or systems and/or other technologies are deployed. For example, if an implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; alternatively, if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware. Hence, there are several possible vehicles by which the processes and/or devices and/or other technologies described herein may be effected, none of which is inherently superior to the other in that any vehicle to be utilized is a choice dependent upon the context in which the vehicle will be deployed and the specific concerns (e.g., speed, flexibility, or predictability) of the implementer, any of which may vary. Those skilled in the art will recognize that optical aspects of implementations will typically employ optically-oriented hardware, software, and/or firmware.

Note that while this invention has been described in terms of several embodiments, these embodiments are non-limiting (regardless of whether they have been labeled as exemplary or not), and there are alterations, permutations, and equivalents, which fall within the scope of this invention. Additionally, the described embodiments should not be interpreted as mutually exclusive, and should instead be understood as potentially combinable if such combinations are permissive. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that claims that may follow be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.

We claim:
 1. A method for data center thermal capacity management, comprising: collecting cabinet information from a plurality of cabinets in a data center, the cabinet information including at least an inlet temperature and a maximum allowable cabinet temperature for each of the plurality of cabinets; deriving a remaining cooling capacity for the data center; for each cabinet among the plurality of cabinets, calculating a θ value using at least the collected cabinet information for the cabinet; determining that all of the inlet temperatures for the plurality of cabinets are below their respective maximum allowable cabinet temperatures, and in response, determining whether each of the calculated θ values is below, at, or above a user-defined θ value; if any one of the calculated θ values is at or above the user-defined θ value, providing an alarm; and if all of the calculated θ values are below the user-defined θ value, distributing the derived remaining cooling capacity among the plurality of cabinets.
 2. The method of claim 1, wherein distributing the derived remaining cooling capacity among the plurality of cabinets comprises: distributing the derived remaining cooling capacity evenly among the plurality of cabinets.
 3. The method of claim 1, comprising: collecting a total power usage for the plurality of cabinets.
 4. The method of claim 3, wherein collecting the total power usage for the plurality of cabinets comprises: collecting total power usage readings from each cabinet among the plurality of cabinets; and summing each of the collected total power usage readings for each cabinet among the plurality of cabinets to obtain the total power usage for the plurality of cabinets.
 5. The method of claim 4, wherein collecting the total power usage reading from a cabinet comprises: collecting the total power usage reading from a power outlet unit installed in the cabinet.
 6. The method of claim 3, comprising: calculating a total cooling capacity for the data center.
 7. The method of claim 6, wherein calculating the total cooling capacity for the data center comprises: obtaining a rated capacity of cooling equipment within the data center; and summing the obtained rated capacities of the cooling equipment within the data center to obtain the total cooling capacity for the data center.
 8. The method of claim 7, wherein deriving the remaining cooling capacity for the data center comprises: subtracting the total power usage for the plurality of cabinets from the calculated total cooling capacity for the data center.
 9. A method for data center thermal capacity management, comprising: collecting an inlet temperature and a maximum allowable cabinet temperature for each cabinet among a plurality of cabinets in a data center; for each cabinet among the plurality of cabinets, calculating a θ value using at least the collected inlet temperature and maximum allowable cabinet temperature for the cabinet; determining whether any of the calculated θ values is greater than or equal to 1; if any of the calculated θ values is greater than or equal to 1: providing a first alarm to a user of a potential overheating risk; and providing the user with suggestions for correcting the overheating risk; if none of the calculated θ values is greater than or equal to 1, determining whether any of the calculated θ values is greater than or equal to a user-defined θ value; if any one of the calculated θ values is greater than or equal to the user-defined θ value, providing a second alarm to the user to adjust the user-defined θ value; and if none of the calculated θ values is greater than or equal to the user-defined θ value, distributing any remaining cooling capacity among the plurality of cabinets.
 10. The method of claim 9, wherein collecting an inlet temperature for a cabinet among a plurality of cabinets in a data center comprises: recording temperatures from a plurality of temperature sensors installed in the cabinet; and selecting a maximum recorded temperature among the recorded temperatures as the inlet temperature for the cabinet.
 11. The method of claim 9, wherein collecting an inlet temperature for a cabinet among a plurality of cabinets in a data center comprises: recording temperatures from a plurality of temperature sensors installed in the cabinet; and averaging the recorded temperatures to obtain the inlet temperature for the cabinet.